<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
	<title>Create a custom analyzer | ElasticSearch 7.7 权威指南中文版</title>
	<meta name="keywords" content="ElasticSearch 权威指南中文版, elasticsearch 7, es7, 实时数据分析，实时数据检索" />
    <meta name="description" content="ElasticSearch 权威指南中文版, elasticsearch 7, es7, 实时数据分析，实时数据检索" />
    <!-- Give IE8 a fighting chance -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
	<link rel="stylesheet" type="text/css" href="../static/styles.css" />
	<script>
	var _link = 'analysis-custom-analyzer.html';
    </script>
</head>
<body>
<div class="main-container">
    <section id="content">
        <div class="content-wrapper">
            <section id="guide" lang="zh_cn">
                <div class="container">
                    <div class="row">
                        <div class="col-xs-12 col-sm-8 col-md-8 guide-section">
                            <div style="color:gray; word-break: break-all; font-size:12px;">原英文版地址: <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-custom-analyzer.html" rel="nofollow" target="_blank">https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-custom-analyzer.html</a>, 原文档版权归 www.elastic.co 所有<br/>本地英文版地址: <a href="../en/analysis-custom-analyzer.html" rel="nofollow" target="_blank">../en/analysis-custom-analyzer.html</a></div>
                        <!-- start body -->
                  <div class="page_header">
<strong>重要</strong>: 此版本不会发布额外的bug修复或文档更新。最新信息请参考 <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html" rel="nofollow">当前版本文档</a>。
</div>
<div id="content">
<div class="breadcrumbs">
<span class="breadcrumb-link"><a href="index.html">Elasticsearch Guide [7.7]</a></span>
»
<span class="breadcrumb-link"><a href="analysis.html">Text analysis</a></span>
»
<span class="breadcrumb-link"><a href="configure-text-analysis.html">Configure text analysis</a></span>
»
<span class="breadcrumb-node">Create a custom analyzer</span>
</div>
<div class="navheader">
<span class="prev">
<a href="configuring-analyzers.html">« Configuring built-in analyzers</a>
</span>
<span class="next">
<a href="specify-analyzer.html">Specify an analyzer »</a>
</span>
</div>
<div class="section">
<div class="titlepage"><div><div>
<h2 class="title">
<a id="analysis-custom-analyzer"></a>Create a custom analyzer<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/analyzers/custom-analyzer.asciidoc">edit</a>
</h2>
</div></div></div>
<p>When the built-in analyzers do not fulfill your needs, you can create a
<code class="literal">custom</code> analyzer which uses the appropriate combination of:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
zero or more <a class="xref" href="analysis-charfilters.html" title="Character filters reference">character filters</a>
</li>
<li class="listitem">
a <a class="xref" href="analysis-tokenizers.html" title="Tokenizer reference">tokenizer</a>
</li>
<li class="listitem">
zero or more <a class="xref" href="analysis-tokenfilters.html" title="Token filter reference">token filters</a>.
</li>
</ul>
</div>
<h3>
<a id="_configuration"></a>Configuration<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/analyzers/custom-analyzer.asciidoc">edit</a>
</h3>
<p>The <code class="literal">custom</code> analyzer accepts the following parameters:</p>
<div class="informaltable">
<table border="0" cellpadding="4px">
<colgroup>
<col>
<col>
</colgroup>
<tbody valign="top">
<tr>
<td valign="top">
<p>
<code class="literal">tokenizer</code>
</p>
</td>
<td valign="top">
<p>
A built-in or customised <a class="xref" href="analysis-tokenizers.html" title="Tokenizer reference">tokenizer</a>.
(Required)
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">char_filter</code>
</p>
</td>
<td valign="top">
<p>
An optional array of built-in or customised
<a class="xref" href="analysis-charfilters.html" title="Character filters reference">character filters</a>.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">filter</code>
</p>
</td>
<td valign="top">
<p>
An optional array of built-in or customised
<a class="xref" href="analysis-tokenfilters.html" title="Token filter reference">token filters</a>.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">position_increment_gap</code>
</p>
</td>
<td valign="top">
<p>
When indexing an array of text values, Elasticsearch inserts a fake "gap"
between the last term of one value and the first term of the next value to
ensure that a phrase query doesn’t match two terms from different array
elements.  Defaults to <code class="literal">100</code>. See <a class="xref" href="position-increment-gap.html" title="position_increment_gap"><code class="literal">position_increment_gap</code></a> for more.
</p>
</td>
</tr>
</tbody>
</table>
</div>
<h3>
<a id="_example_configuration"></a>Example configuration<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/analyzers/custom-analyzer.asciidoc">edit</a>
</h3>
<p>Here is an example that combines the following:</p>
<div class="variablelist">
<dl class="variablelist">
<dt>
<span class="term">
Character Filter
</span>
</dt>
<dd>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<a class="xref" href="analysis-htmlstrip-charfilter.html" title="HTML strip character filter">HTML Strip Character Filter</a>
</li>
</ul>
</div>
</dd>
<dt>
<span class="term">
Tokenizer
</span>
</dt>
<dd>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<a class="xref" href="analysis-standard-tokenizer.html" title="Standard Tokenizer">Standard Tokenizer</a>
</li>
</ul>
</div>
</dd>
<dt>
<span class="term">
Token Filters
</span>
</dt>
<dd>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<a class="xref" href="analysis-lowercase-tokenfilter.html" title="Lowercase token filter">Lowercase Token Filter</a>
</li>
<li class="listitem">
<a class="xref" href="analysis-asciifolding-tokenfilter.html" title="ASCII folding token filter">ASCII-Folding Token Filter</a>
</li>
</ul>
</div>
</dd>
</dl>
</div>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom", <a id="CO371-1"></a><i class="conum" data-value="1"></i>
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "Is this &lt;b&gt;déjà vu&lt;/b&gt;?"
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/800.console"></div>
<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO371-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>Setting <code class="literal">type</code> to <code class="literal">custom</code> tells Elasticsearch that we are defining a custom analyzer.
Compare this to how <a class="xref" href="configuring-analyzers.html" title="Configuring built-in analyzers">built-in analyzers can be configured</a>:
<code class="literal">type</code> will be set to the name of the built-in analyzer, like
<a class="xref" href="analysis-standard-analyzer.html" title="Standard Analyzer"><code class="literal">standard</code></a> or <a class="xref" href="analysis-simple-analyzer.html" title="Simple Analyzer"><code class="literal">simple</code></a>.</p>
</td>
</tr>
</table>
</div>
<p>The above example produces the following terms:</p>
<div class="pre_wrapper lang-text">
<pre class="programlisting prettyprint lang-text">[ is, this, deja, vu ]</pre>
</div>
<p>The previous example used tokenizer, token filters, and character filters with
their default configurations, but it is possible to create configured versions
of each and to use them in a custom analyzer.</p>
<p>Here is a more complicated example that combines the following:</p>
<div class="variablelist">
<dl class="variablelist">
<dt>
<span class="term">
Character Filter
</span>
</dt>
<dd>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<a class="xref" href="analysis-mapping-charfilter.html" title="Mapping character filter">Mapping Character Filter</a>, configured to replace <code class="literal">:)</code> with <code class="literal">_happy_</code> and <code class="literal">:(</code> with <code class="literal">_sad_</code>
</li>
</ul>
</div>
</dd>
<dt>
<span class="term">
Tokenizer
</span>
</dt>
<dd>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<a class="xref" href="analysis-pattern-tokenizer.html" title="Pattern Tokenizer">Pattern Tokenizer</a>, configured to split on punctuation characters
</li>
</ul>
</div>
</dd>
<dt>
<span class="term">
Token Filters
</span>
</dt>
<dd>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<a class="xref" href="analysis-lowercase-tokenfilter.html" title="Lowercase token filter">Lowercase Token Filter</a>
</li>
<li class="listitem">
<a class="xref" href="analysis-stop-tokenfilter.html" title="Stop token filter">Stop Token Filter</a>, configured to use the pre-defined list of English stop words
</li>
</ul>
</div>
</dd>
</dl>
</div>
<p>Here is an example:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": { <a id="CO372-1"></a><i class="conum" data-value="1"></i>
          "type": "custom",
          "char_filter": [
            "emoticons"
          ],
          "tokenizer": "punctuation",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "tokenizer": {
        "punctuation": { <a id="CO372-2"></a><i class="conum" data-value="2"></i>
          "type": "pattern",
          "pattern": "[ .,!?]"
        }
      },
      "char_filter": {
        "emoticons": { <a id="CO372-3"></a><i class="conum" data-value="3"></i>
          "type": "mapping",
          "mappings": [
            ":) =&gt; _happy_",
            ":( =&gt; _sad_"
          ]
        }
      },
      "filter": {
        "english_stop": { <a id="CO372-4"></a><i class="conum" data-value="4"></i>
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "I'm a :) person, and you?"
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/801.console"></div>
<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO372-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>Assigns the index a default custom analyzer, <code class="literal">my_custom_analyzer</code>. This
analyzer uses a custom tokenizer, character filter, and token filter that
are defined later in the request.</p>
</td>
</tr>
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO372-2"><i class="conum" data-value="2"></i></a></p>
</td>
<td align="left" valign="top">
<p>Defines the custom <code class="literal">punctuation</code> tokenizer.</p>
</td>
</tr>
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO372-3"><i class="conum" data-value="3"></i></a></p>
</td>
<td align="left" valign="top">
<p>Defines the custom <code class="literal">emoticons</code> character filter.</p>
</td>
</tr>
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO372-4"><i class="conum" data-value="4"></i></a></p>
</td>
<td align="left" valign="top">
<p>Defines the custom <code class="literal">english_stop</code> token filter.</p>
</td>
</tr>
</table>
</div>
<p>The above example produces the following terms:</p>
<div class="pre_wrapper lang-text">
<pre class="programlisting prettyprint lang-text">[ i'm, _happy_, person, you ]</pre>
</div>
</div>
<div class="navfooter">
<span class="prev">
<a href="configuring-analyzers.html">« Configuring built-in analyzers</a>
</span>
<span class="next">
<a href="specify-analyzer.html">Specify an analyzer »</a>
</span>
</div>
</div>

                  <!-- end body -->
                        </div>
                        <div class="col-xs-12 col-sm-4 col-md-4" id="right_col">
                        
                        </div>
                    </div>
                </div>
            </section>
        </div>
    </section>
</div>
<script src="../static/cn.js"></script>
</body>
</html>